Wild soybean, the progenitor of cultivated soybean, is an important gene pool for ongoing soybean breeding
efforts. To identify yield-enhancing quantitative trait loci (QTL) or genes from wild soybean, 113 wild
soybean accessions were phenotyped for five yield-related traits and genotyped with 85 simple sequence repeat
(SSR) markers to conduct association mapping. A total of 892 alleles were detected for the 85 SSR markers,
with an average of 10.49 alleles per marker; the corresponding PIC values ranged from 0.07 to 0.92, with an
average of 0.73. The genetic diversity of each SSR marker ranged from 0.07 to 0.93, with an average of 0.75.
A total of 18 SSR markers were identified as associated with the five traits. Two SSR markers, sct_010 and
satt316, which are associated with yield per plant, were stably detected over two years at two experimental
locations. Our results suggest that association mapping can be an effective approach for identifying QTL
from wild soybean.
Introduction

The wild relatives of crops have been undeniably beneficial to modern agriculture, providing plant
breeders with a broad pool of potentially useful genetic resources (Hajjar and Hodgkin 2007). However,
these wild relatives are consistently ignored for yield improvement because, in general, they have
smaller seeds, a greater tendency to shatter and other undesirable traits. Nevertheless, there is an
increasing number of cases in which high-yielding derivatives of hybrids have been created through the
use of wild relatives, including in tomato, wheat, rice, oat, barley, sorghum, maize and soybean (Frey
et al. 1984, Fu et al. 2010, Kan et al. 2012, Li et al. 2008, Reeves and Bockholdt 1964, Rick 1974,
Tanksley and McCouch 1997, Xiao et al. 1996), indicating that a crop's wild relatives can be used as a
gene resource to improve the yield of cultivated crops through traditional breeding or molecular
marker-assisted selection. In China, two introgressions from a wild relative of rice have been
associated with a 30% increase in the yield of the world's highest-yielding hybrid rice (Deng et al.
2004). For tomato, yield increases of greater than 50% have resulted from pyramiding three independent,
yield-promoting genomic segments from a wild relative (Gur and Zamir 2004). Nevertheless, we have
learned little about the chromosomal regions that contribute to yield increases or the genetic bases of
these traits (Swamy et al. 2008). Through the implementation of linkage-based QTL mapping and linkage
disequilibrium (LD)-based association mapping in crop genetics, it is possible to locate the genomic
regions that contribute to yield-related traits, clone the gene/QTL from a wild relative and use this
information to improve cultivated crops.

Cultivated soybean or soybean landraces are consistently selected as germplasm to improve soybean
yield, and linkage mapping is the main method for investigating the genetic basis of yield. However,
genetic diversity has been lost from cultivated soybean through artificial selection. Wild soybean,
which possesses high genetic diversity compared to cultivated soybean, can serve as germplasm to
improve cultivated soybean and provide plant breeders with a broad pool of potentially useful genetic
resources. Tanksley and McCouch (1997) noted the potential role of genome mapping in efficiently
utilizing the genetic diversity of wild relatives and suggested that the continued sampling of wild
germplasm would result in new gene discoveries and utilization.

Research on wild soybean is limited and mainly focuses on biotic or abiotic stress. Nevertheless, a
few studies on yield have suggested that wild soybean can be used as germplasm to improve soybean
yield traits, and some favorable alleles have been identified in wild soybeans. Concibido et al. (2003)
mapped a yield QTL from wild soybean PI407305 using a BC2 population derived from a cross between
cultivated soybean (HS-1) and wild soybean (PI407305). Wang et al. (2004) mapped eight QTL for yield
using a BC2F4 population derived from a cross between cultivated soybean (IA2008) and wild soybean
(PI468916), and four favorable alleles were found in wild soybean. Li et al. (2008) mapped a QTL
closely linked to an SSR marker (satt511) from wild soybean in three environments using a BC2F4
population derived from a cross between cultivated soybean (7499) and wild soybean (PI245331). They
also found that the additive effect of wild soybean can increase the yield from 191 kg ha⁻¹ to
235 kg ha⁻¹. Wen et al. (2008) conducted association mapping for agronomic and quality traits in wild
and cultivated soybean populations separately and found some associations that were detected only in
the wild soybean population. Kan et al. (2012) mapped two QTL for pod number per plant and one QTL for
yield per plant from wild soybean over two years. All of these results suggest that wild soybean
contains yield-favorable alleles and that it is feasible to identify favorable alleles for yield in
wild soybean. Wild soybean provides a large variation of naturally occurring alleles for QTL mapping
and for use in soybean improvement (Iyer-Pascuzzi et al. 2007), and many useful new alleles for
yield-related traits can be mined from wild soybean. As additional QTL for yield are identified from
different wild accessions by linkage or association mapping, it will become clear whether all of the
accessions or only a few wild accessions that are distant from cultivars have yield-enhancing QTL.

In this study, the variation of five yield-related traits (i.e., days from sowing to flowering, days
from sowing to maturity, 100-seed weight, pod number per plant and yield per plant) in wild soybean
from China was analyzed, and association mapping was conducted for these traits to detect
yield-favorable QTL in wild soybean. Based on the MLM model (Q+K), a total of 45 marker-trait
associations were identified for the five yield-related traits, involving 18 SSR markers.

Materials and Methods 

Plant materials and phenotyping 

A total of 113 wild soybean accessions, representing the full geographic range of wild soybean from
southern China to northeast China, were selected to construct the association mapping population
(Supplemental Table 1). The experiments were conducted at the Jiangpu Agronomic Experimental Station
of Nanjing Agricultural University (32°12'N, 118°37'48"E), Nanjing, China, in the summers of 2011 and
2012 and at the Nanyang Experimental Station of Henan Agricultural University (38°7'N, 110°34'E),
Nanyang, China, in the summer of 2012. The accessions were planted in a randomized complete block
design, with 100 cm x 100 cm hill plots, 4 plants per plot and 2 replications. Five yield-related
traits were evaluated: days from planting to flowering (DTF), days from planting to maturity (DTM)
(without data from the Nanyang Experimental Station), pod number per plant (PN), 100-seed weight (HSW)
and yield per plant (YLD).



SSR genotyping 

Genomic DNA was extracted from the young leaves of each accession as described by Doyle and Doyle
(1990), with slight modifications. A total of 85 SSR markers representing 19 soybean chromosomes were
selected from published genetic maps (Hwang et al. 2009, Song et al. 2004) to genotype the 113 wild
soybean accessions, and the genetic positions of the SSR markers were taken from the genetic maps
constructed by Song et al. (2004) and Hwang et al. (2009). The PCR amplification was performed in a
10-μl volume containing 20 ng total DNA, 0.4 μM forward and reverse primers, 200 μM of each dNTP,
1× PCR buffer (10 mM Tris-HCl, pH 8.3 and 50 mM KCl), 2 mM MgCl2 and 0.5 U Taq DNA polymerase. The PCR
was programmed with an initial denaturation at 94°C for 5 min, followed by 35 cycles of 95°C for 30 s,
54°C for 1 min and 72°C for 1 min, with a final extension at 72°C for 10 min. The PCR reactions were
performed using an MJ Research PTC 225 DNA Engine thermal cycler (Bio-Rad, USA). The PCR products were
separated by 8% non-denaturing polyacrylamide gel electrophoresis with a 29 : 1 ratio of acrylamide :
bisacrylamide and then silver-stained, as described by Santos et al. (1993). The stained bands were
analyzed based on their migration distance relative to the pBR322 DNA Marker (Fermentas) using
Quantity One software v.4.4.0 (Bio-Rad, Hercules, CA, USA).

Statistical analysis 

Phenotype: The data analysis was performed using the R statistical language (R Development Core Team
2010). Analysis of variance (ANOVA) of all phenotypic data, based on the trait means of each accession
in the three environments, was conducted using the model: Phen = genotypes + years + locates + e,
where Phen was the phenotypic observation, genotypes was the genetic effect, years was the effect of
the different years, locates was the effect of the different experimental locations and e was the
residual. The best linear unbiased predictor (BLUP) values for each line were calculated using the
lme4 package (Bates et al. 2011). Heritability was calculated as the genotypic variance divided by the
total variance. The Spearman rank correlation coefficient between each pair of traits was calculated
based on the BLUP values using the "cor" function. The effect of population structure on the phenotype
was assessed based on the BLUP value for each accession using the GLM procedure in SAS 8.02 (SAS
Institute 1999).
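
As a minimal sketch of the heritability definition above (genotypic variance divided by total variance), the following Python function assumes that the variance components have already been estimated, for example from a mixed model. The component names and numeric values are hypothetical and are not taken from this study, and treating the total as the sum of all supplied components is an assumption of the sketch.

```python
def broad_sense_heritability(components, genotype_key="genotype"):
    """Return genotypic variance divided by total variance.

    `components` maps each source of variation to its estimated variance.
    The total is taken as the sum of all supplied components (assumption).
    """
    total = sum(components.values())
    if total <= 0:
        raise ValueError("total variance must be positive")
    return components[genotype_key] / total


# Hypothetical variance components, for illustration only.
example = {"genotype": 9.0, "year": 0.5, "location": 0.25, "residual": 0.25}
h2 = broad_sense_heritability(example)
```

With these illustrative components the genotypic share of the total variance is 9.0 out of 10.0.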

Genotypic data analysis: The number of alleles, gene diversity and polymorphic information content
(PIC) were calculated using PowerMarker version 3.25 (Liu and Muse 2005). Additionally, Nei's genetic
distance (1973) among the individuals was calculated using PowerMarker and was then used to construct
a neighbor-joining (NJ) phylogenetic tree with 1000 bootstrap replicates. The tree was visualized
using MEGA version 4.0 (Tamura et al. 2007).
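
The two per-locus statistics named above can be illustrated with their standard definitions: gene diversity as 1 minus the sum of squared allele frequencies, and PIC as gene diversity minus the sum over allele pairs of 2·p_i²·p_j². The Python sketch below assumes one allele call per inbred accession; the allele sizes in the example are hypothetical, and the code is not the PowerMarker implementation.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(alleles):
    """Relative frequency of each distinct allele in a list of allele calls."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def gene_diversity(freqs):
    """Expected heterozygosity: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphic information content:
    1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    cross = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return gene_diversity(freqs) - cross


# Hypothetical allele sizes (bp) at one SSR locus, for illustration only.
freqs = allele_frequencies(["204", "204", "191", "210"])
diversity = gene_diversity(freqs)
information = pic(freqs)
```

PIC is never larger than gene diversity, which matches the ordering of the averages reported for the 85 markers (0.73 and 0.75).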

Population structure: The Bayesian model-based program STRUCTURE 2.2 (Pritchard et al. 2000) was used
to infer the population structure using 74 SSR markers, which were selected to represent 20
chromosomes. The burn-in period was 100 000 and the number of iterations was 100 000, using a model
that allowed for admixture and correlated allele frequencies. The number of subpopulations (K) was set
from 1 to 10, with 7 independent runs for each K. The most likely number of subpopulations was then
determined using the Delta K method proposed by Evanno et al. (2005). A kinship matrix was calculated
using SPAGeDi software (Hardy and Vekemans 2002). All of the negative kinship values between
individuals were set to zero, according to Yu et al. (2006).
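
The Delta K criterion mentioned above can be sketched as follows. This is one common reading of the method, assumed here: the absolute second-order difference of the mean log-likelihood across successive K values, divided by the standard deviation of the log-likelihood over the runs at K. The log-likelihood values in the example are hypothetical, and the use of the sample standard deviation is also an assumption.

```python
from statistics import mean, stdev


def delta_k(ln_prob):
    """Delta K for every K that has both neighbours K-1 and K+1.

    `ln_prob` maps K to the list of log-likelihoods from the independent runs.
    """
    result = {}
    for k in sorted(ln_prob):
        if k - 1 not in ln_prob or k + 1 not in ln_prob:
            continue  # the second-order difference needs both neighbours
        second_diff = abs(
            mean(ln_prob[k + 1]) - 2.0 * mean(ln_prob[k]) + mean(ln_prob[k - 1])
        )
        spread = stdev(ln_prob[k])  # sample standard deviation (assumption)
        if spread > 0:
            result[k] = second_diff / spread
    return result


# Hypothetical log-likelihoods for K = 1..3 with two runs each.
runs = {1: [-100.0, -102.0], 2: [-80.0, -82.0], 3: [-78.0, -80.0]}
scores = delta_k(runs)
best_k = max(scores, key=scores.get)
```

Delta K is undefined for the smallest and largest K tested, so with K set from 1 to 10 only K = 2 to 9 can be scored.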

Linkage disequilibrium calculation: The level of LD between pairs of SSRs was calculated using the
software TASSEL V2.1 (Bradbury et al. 2007). LD was measured for each pair of loci using D', and the
significance (P-value) for each SSR pair was determined with 1000 permutations.
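
To make the D' statistic concrete, the sketch below computes it for one chosen allele at each of two loci, which is a biallelic simplification assumed here for brevity; SSR loci are multi-allelic, and the multi-allelic calculation performed by the software used in this study is not reproduced. The two-locus genotypes are assumed to be known haplotypes (as for inbred accessions), and the allele labels are hypothetical.

```python
def d_prime(haplotypes, allele_a, allele_b):
    """|D'| between `allele_a` at locus 1 and `allele_b` at locus 2.

    `haplotypes` is a list of (locus1_allele, locus2_allele) pairs.
    """
    n = len(haplotypes)
    p_a = sum(1 for a, _ in haplotypes if a == allele_a) / n
    p_b = sum(1 for _, b in haplotypes if b == allele_b) / n
    p_ab = sum(1 for a, b in haplotypes if a == allele_a and b == allele_b) / n
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max if d_max > 0 else 0.0


# Hypothetical haplotypes: complete association, then no association.
linked = [("A", "B")] * 5 + [("a", "b")] * 5
unlinked = [("A", "B"), ("A", "b"), ("a", "B"), ("a", "b")]
```

Here `d_prime(linked, "A", "B")` reaches the maximum of 1, whereas `d_prime(unlinked, "A", "B")` is 0.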

Association mapping: The association between the phenotypes and markers was evaluated with a general
linear model (Q) and a mixed linear model (Q+K), as implemented in TASSEL V2.1 (Bradbury et al. 2007,
Yu et al. 2006). In the mixed model, we tested the association between the phenotype and SSR markers,
with Q as a fixed covariate and kinship (K) as a random effect. Markers were identified as
significantly associated with traits using a threshold of -Log(P-value) > 2.00.
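
The significance threshold can be illustrated with a short Python sketch that converts P-values to -Log(P-value) and keeps the associations above 2.00. A base-10 logarithm is assumed, so the threshold corresponds to P < 0.01; the trait and marker keys and the P-values in the example are hypothetical.

```python
import math


def significant_associations(p_values, threshold=2.0):
    """Keep marker-trait pairs whose -log10(P) exceeds `threshold`.

    `p_values` maps a (trait, marker) key to the P-value of the association.
    """
    hits = {}
    for key, p in p_values.items():
        score = -math.log10(p)  # base-10 logarithm (assumption)
        if score > threshold:
            hits[key] = round(score, 2)
    return hits


# Hypothetical P-values, for illustration only.
example = {("trait_1", "marker_a"): 1e-5, ("trait_1", "marker_b"): 0.05}
hits = significant_associations(example)
```

Only the first pair passes, because 0.05 corresponds to a score of about 1.30.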

The phenotypic effect of each allele of the SSR markers associated with the five traits was estimated
by comparing the average phenotypic value of the accessions carrying the specific allele with the
average of all accessions, based on the BLUP values, as: $a_i = \sum_j x_{ij}/n_i - \sum_k X_k/n$,
where $a_i$ represents the phenotypic effect of the $i$th allele; $x_{ij}$ represents the phenotypic
value of the $j$th material with the $i$th allele; $n_i$ represents the number of materials with the
$i$th allele; and $\sum_k X_k/n$ represents the mean phenotypic value of all materials. If $a_i > 0$,
the allele is regarded as a positive allele; if $a_i < 0$, it is regarded as a negative allele.
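
The allele-effect estimate described above (mean of the accessions carrying an allele minus the mean of all accessions) can be sketched in Python as follows. The input layout, one (allele, BLUP value) pair per accession at a single locus, is an assumption of this sketch, and the allele names and values are hypothetical.

```python
def allele_effects(records):
    """Phenotypic effect of each allele at one locus.

    `records` is a list of (allele, blup_value) pairs, one per accession.
    The effect is the mean over carriers minus the mean over all accessions.
    """
    overall_mean = sum(value for _, value in records) / len(records)
    by_allele = {}
    for allele, value in records:
        by_allele.setdefault(allele, []).append(value)
    return {
        allele: sum(values) / len(values) - overall_mean
        for allele, values in by_allele.items()
    }


# Hypothetical BLUP values for two alleles, for illustration only.
records = [("allele_1", 70.0), ("allele_1", 74.0),
           ("allele_2", 40.0), ("allele_2", 44.0)]
effects = allele_effects(records)
```

A positive effect marks a positive allele and a negative effect a negative allele, following the sign convention given above.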

Unless otherwise noted, all of the analyses were performed using the statistical software R
(R Development Core Team 2010).

Results 

Genetic diversity and population structure 

By genotyping the 113 wild soybean accessions with 85 
SSR markers, we detected a total of 892 alleles, ranging 
from 2 to 23 alleles per SSR marker, with an average of 
10.49 alleles per locus. The corresponding PIC values 
ranged from 0.07 to 0.92, with an average of 0.73. The ge- 
netic diversity at each SSR marker ranged from 0.07 to 0.93 
with an average of 0.75 (Supplemental Table 2). 

The genetic relationships among the accessions were investigated using a model-based Bayesian
clustering method with 74 SSR markers. Four subpopulations were detected by STRUCTURE (Fig. 1A). The
first, second, third and fourth subpopulations contained 17, 13, 41 and 22 accessions, respectively
(Fig. 1B). The unrooted neighbor-joining tree of the 113 wild soybean accessions, based on Nei's 1973
genetic distance, was consistent with the results from STRUCTURE (Fig. 2).

The relative kinship estimates based on the 74 SSR markers indicated that 80.34% of the pairwise
kinship estimates were within the range of 0 to 0.05; the remaining estimates ranged from 0.05 to
0.71, with a continuously decreasing number of pairs falling in the higher estimate categories
(Fig. 3). The kinship analysis revealed that the majority of the accessions had no or only a weak
relationship with the other accessions in this population.
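
A summary of this kind can be sketched in Python: negative kinship estimates are first set to zero, as described in the Materials and Methods, and the share of pairwise values in the lowest category is then computed. The pairwise values in the example are hypothetical, and the inclusive upper bound of 0.05 is an assumption.

```python
def summarize_kinship(pairwise, upper=0.05):
    """Clip negative kinship estimates to zero and return the clipped values
    together with the fraction that lies between 0 and `upper` (inclusive)."""
    clipped = [max(0.0, k) for k in pairwise]
    low = sum(1 for k in clipped if k <= upper)
    return clipped, low / len(clipped)


# Hypothetical pairwise kinship estimates, for illustration only.
clipped, fraction_low = summarize_kinship([-0.1, 0.0, 0.03, 0.2])
```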

Phenotypic variance and correlation 

ANOVA revealed that there were significant differences 
among the accessions (P < 0.01) for five yield-related traits. 




Fig. 1. A. Calculation of the true K of the 113 wild soybean accessions, according to Evanno et al. (2005), and B. the population structure of the
113 wild soybean accessions, based on the 74 SSR loci. Each individual is represented by a single vertical line divided into four colored segments,
with lengths proportional to the membership of that individual in each of the four clusters. The number under each vertical line represents the
accession, corresponding to Supplemental Table 1.

Fig. 2. Unrooted neighbor-joining tree of the 113 wild soybean accessions, based on Nei's 1973 genetic distance. The color of each line shows the
subpopulation to which the accession belongs, based on Fig. 1B. The number on each line represents the accession, corresponding to
Supplemental Table 1.



indicating a large amount of genetic variation in the population (Supplemental Fig. 1). The effects of
year and location on the five traits were significant (Table 1). The heritability of the five traits
ranged from 39.57% for PN to 97.84% for DTF. The population structure had a strong influence on DTF
(32.28%) and DTM (28.58%), with P-values < 0.0001. A significant effect was also detected for PN
(13.25%), with a P-value < 0.0038. No significant effects were detected for HSW and YLD, with P-values
of 0.53 and 0.29, respectively (Table 1).

Pearson correlation coefficients between traits based on 
the BLUP value were calculated and there was a significant 
negative correlation between HSW and PN (r = -0.44, 



Fig. 3. Distribution of pair-wise kinship coefficients for the 113 wild soybean accessions (x-axis: relative kinship). The values are SPAGeDi
estimates based on 74 SSR markers.



P < 0.01). Conversely, there were significant positive correlations between HSW and YLD (r = 0.53,
P < 0.01), DTF and PN (r = 0.51, P < 0.01), DTM and YLD (r = 0.26, P < 0.01), DTM and PN (r = 0.34,
P < 0.01) and DTF and DTM (r = 0.91, P < 0.01) (Table 2).

LD and association mapping 

The LD pattern was assessed based on the 2279 pairwise combinations of the 85 SSR loci. Based on the
D' estimates, 15.05% of the pairs had significant LD at P < 0.05, and D' ranged from 0.0038 to 1, with
an average of 0.38.

Based on the MLM model (Q+K), a total of 45 marker- 
trait associations were identified for the five yield-related 
traits, involving 18 SSR markers (Table 3 and Fig. 4). Among 
the 45 significant associations, seven were correlated with 



Table 1. Descriptive statistics, ANOVA and broad-sense heritability for five traits

Trait  E       Mean ± SD         Max      Min      Year  Loc  Gen  h² (%)  R² (%)
DTF    2011NJ  55.61 ± 12.92     84.00    27.00          **        97.84   32.28
       2012NJ  57.14 ± 12.58     84.50    26.00
       2012NY  60.55 ± 12.98     92.00    35.50
DTM    2011NJ  105.70 ± 12.76    138.33   81.28          Na        96.30   28.58
       2012NJ  102.61 ± 11.39    131.00   78.50
PN     2011NJ  268.43 ± 111.42   268.43   93.40                    39.57   13.25
       2012NJ  220.19 ± 80.65    419.58   50.33
       2012NY  736.21 ± 362.08   1621.5   124.13
HSW    2011NJ  2.58 ± 1.55       10.41    0.90           **   **   90.95   2.85
       2012NJ  2.39 ± 1.44       9.74     0.90
       2012NY  2.19 ± 1.50       8.48     0.66
YLD    2011NJ  15.18 ± 7.34      44.65    3.62     **    **   **   90.31   4.36
       2012NJ  16.37 ± 9.53      54.35    3.19
       2012NY  17.07 ± 8.28      43.38    1.66

E, environment; SD, standard deviation; Na, no information; h², heritability; R², phenotypic variance explained by the population structure;
** Significant at P < 0.01.
DTF, ten with DTM, three with PN, twelve with HSW and thirteen with YLD.

The 7 SSR-trait associations related to DTF involved 3 SSR loci; two SSR loci, satt322 on chromosome
6 and satt564 on chromosome 18, were identified under the BLUP value and at both experimental
stations in 2012. The ten SSR-trait associations related to DTM involved 5 SSR loci; two SSR loci,
sat_304 on chromosome 7 and satt521 on


Table 3. SSR loci significantly associated with five traits and the significance (-Log(P-value))

Trait  Locus     Chr.  Position  2011NJ      2012NJ      2012NY      BLUP      BLUP
                       (cM)      GLM MLM     GLM MLM     GLM MLM     GLM       MLM
[The rows for DTF, DTM and the first HSW loci are not legible in the source.]
HSW    satt516   13    44.42     3.33 2.43   3.91 2.77
       sct_188   13    85.33                 2.37        2.24        2.17
       satt706   15    43.36                             2.06
       satt285   16    25.51     2.19 3.29   2.39
       satt564   18    57.32                 2.70        3.93        2.96
       sct_010   19    59.52     2.24        2.68                    2.12
       GMES2079  19    90.10     2.12        2.21
       GMES4376  19    91.10     2.41        2.70        2.60        2.82
PN     satt342   1     48.14     2.34        2.38
       satt641   3     29.28                             2.30        2.19
       sat_137   5     0.00                              2.30
       satt385   5     64.74                             2.49        2.21
       satt322   6     82.23     2.33        2.04
       satt389   17    79.23     2.35        2.11
YLD    satt641   3     29.28     3.45 2.96   2.36                    2.92
       satt521   3     65.46     2.04        3.22        4.49 4.49   3.75      2.28
       satt316   6     127.67    3.10 3.76   2.48 2.35   2.07 2.10   3.06      3.51
       sat_330   7     140.69    3.22 3.24                                     2.00
       sct_010   19    59.52     5.51 5.06   4.28 3.24   2.64 2.15   5.59      4.47
       GMES4376  19    91.10                 2.37

NJ, Nanjing; NY, Nanyang. Where a single value appears under an environment, the source layout does not show whether it belongs to GLM or MLM.



Table 2. Correlation coefficients among five traits, as based on BLUP values

Trait  DTF      DTM      YLD      PN
DTM    0.91**
YLD    0.10     0.26**
PN     0.51**   0.34**   0.10
HSW    -0.18    0.11     0.53**   -0.44**

** P < 0.01.
Fig. 4. Soybean simple sequence repeat (SSR) genetic linkage map showing the marker positions and estimated map distances (cM; indicated on
the left of the vertical bars) based on the consensus linkage map of Song et al. (2004). Markers associated with any of the five yield-related traits
are indicated by red characters; SSR markers in black characters showed no association with any of the five yield-related traits in this study.
DTF: days from sowing to flowering, DTM: days from sowing to maturity, HSW: hundred-seed weight, PN: pod number per plant, YLD: yield per plant.
chromosome 6 were identified over two years and under the BLUP value. The 12 SSR-trait associations
related to HSW involved seven SSR markers; three SSR loci, satt641, satt285 and satt516, were detected
over two years in Nanjing, but only satt641 was detected under the BLUP value. Four SSR-trait
associations were related to PN, but all of them were detected in only one environment. The 13
SSR-trait associations related to YLD involved 5 SSR markers; two SSR loci, sct_010 on chromosome 19
and satt316 on chromosome 6, were identified under all conditions. In addition, five of the 18 SSR
markers were associated with two or more traits.

Mining of the elite alleles 

The phenotypic effect of each allele of the SSR markers significantly associated with the five
yield-related traits is shown in Supplemental Table 3. Among the alleles associated with DTF,
satt521-204 had the most positive phenotypic effect and increased DTF by 18.03 days, whereas
satt564-191 had the most negative phenotypic effect (-18.31 days). Among the alleles associated with
DTM, satt521-204 and sat_304-178 had the most positive phenotypic effect and increased DTM by 28.96
days, whereas satt564-191 and sat_304-162 had the most negative phenotypic effect (-21.08 days). Among
the alleles associated with HSW, satt516-277 had the most positive phenotypic effect and increased HSW
by 4.39 g, whereas satt285-275 had the most negative phenotypic effect (-2.04 g). Among the alleles
associated with YLD, satt316-277 had the most positive phenotypic effect and increased YLD by 22.8 g,
whereas sat_330-399 had the most negative phenotypic effect (-6.28 g). Among the alleles associated
with PN, satt342-272 had the most positive phenotypic effect and increased PN by 95.10, whereas
satt342-220 had the most negative phenotypic effect (-64.82).

Discussion 

Genetic diversity and population structure 

Increasing the yield of soybean is a major target for soybean breeders. Indeed, there is much concern
regarding the reduction of the diversity of currently cultivated soybeans. Because early farmers used
only a limited number of individual progenitors in the domestication process and only the seeds from
the best plants were utilized to form the next generation, genetic diversity was lost (Doebley et al.
2006). After domestication, the genetic variation in soybean has been continually reduced by modern
plant breeding. In the present study, the average number of alleles per locus was 10.49, which is low
compared to the 17.8 detected by Wen et al. (2009) but higher than the 5.6 alleles per locus reported
by Wang et al. (2010), perhaps due to differences in the samples, sample size and SSR markers that
were selected. In the present study, the 9 EST-SSRs that were selected had a very low number of
alleles, ranging from 2 to 6, with an average of 3.5 per locus; thus, the markers that are selected
affect the estimated allele number and genetic diversity.

Population structure can lead to the discovery of many false-positive QTL (Zhao et al. 2007), and
several models have been developed to resolve this complication, including genomic control, the Q+K
model and the PCA model (Devlin and Roeder 1999, Devlin et al. 2004, Price et al. 2006, Yu et al.
2006). Previous studies have demonstrated that the Q+K method is the most powerful method for
performing association mapping (Stich and Melchinger 2009, Yu et al. 2006, Zhao et al. 2007). In the
present study, four subpopulations were identified using a STRUCTURE analysis based on the Bayesian
model, and the population structure had different effects on different traits (Table 1).

Association mapping and potential usages of the results in 
soybean breeding 

Based on the GLM model (Q), a total of 118 marker-trait associations were identified for the five
yield-related traits, involving 33 SSR markers (Table 3). Based on the MLM model (Q+K), forty-five
SSR-trait associations were identified, involving 18 SSR markers. Except for BE475343, associated with
HSW on Chr2 in 2011NJ, and satt285, associated with HSW on Chr16 in 2012NJ, all of the SSR-trait
associations identified by MLM were also detected by GLM, but many of the SSR-trait associations
identified by GLM were not detected by MLM, which may have resulted from the effect of kinship.

Of the 18 SSR markers identified by MLM, five were associated with two or more traits. Sat_334 on
chromosome 12, associated with HSW, was close to satt442, which has been reported to be associated
with HSW in wild soybean (Wen et al. 2008). These two loci associated with HSW were also identified
in an F2 population derived from a cross between cultivated and wild soybeans (Kan et al. 2012).
However, to our knowledge, these QTL have not been identified in cultivated soybean populations,
indicating that they may be QTL involved in soybean domestication. Sct_010 on chromosome 19, which is
associated with YLD, has been reported by Kan et al. (2012), with the yield-increasing allele coming
from wild soybean. Satt316 on chromosome 6, associated with YLD, has been reported by Reinprecht et
al. (2006). Satt150 on chromosome 7, which is associated with HSW, has been reported to be closely
linked with soybean seed size and volume (Salas et al. 2006).

The majority of the loci that were associated with the five traits could only be identified in a
specific environment, either that of the Nanjing Experimental Station or that of the Nanyang
Experimental Station, which indicated that wild soybean is very sensitive to the environment.
However, some stable associations were identified in our study, such as satt564 and satt322, which
were associated with DTF; sat_304 and satt521, which were associated with DTM; and sct_010 and
satt316, which were associated with YLD. A low threshold, -Log(P-value) > 2.00, was used to detect
the marker-trait associations because of the limited number of markers used in this study. If
high-density DNA polymorphism datasets are used for association mapping, additional markers with
high -Log(P-value) may be obtained.

To make use of the results of the association analysis, we assessed the phenotypic effect of each
allele of the SSR markers associated with the five yield-related traits, and a number of elite
alleles were detected. These will be useful for molecular marker-assisted selection and molecular
design breeding.

A small sample and a limited number of markers were used in this study, and the results need to be
confirmed using linkage mapping or a larger association population. However, the results are credible
because many of the identified loci were associated with traits in agreement with previous reports of
linkage or association mapping.

Acknowledgments 

This work was supported by the National Basic Research Program of China (973 Program) (2010CB125906,
2009CB118400), the National Natural Science Foundation of China (31000718, 31171573, 31201230,
31271749), Jiangsu Provincial Programs (BE2012328, BK2012768, BE2012747) and the Young Scholar
Innovation Foundation of the Nanjing Agricultural University (KJ2011004).

Literature Cited 

Bates, D., M.Maechler and B.Bolker (2011) Welcome to lme4 — 
Mixed-effects models project. R Foundation for Statistical Com- 
puting, Vienna, Austria. 

Bradbury, P., Z.Zhang, D. Kroon, T. Casstevens, Y. Ramdoss and 
E.S. Buckler (2007) TASSEL: software for association mapping of 
complex traits in diverse samples. Bioinformatics 23: 2633-2635. 

Concibido,V.C., B.LaVallee, P.McLaird, N.Pineda, J.Meyer, 
L. Hummel, J.Yang, K. Wu and X. Delannay (2003) Introgression 
of a quantitative trait locus for yield from Glycine soja into com- 
mercial soybean cultivars. Theor. Appl. Genet. 106: 575-582. 

Deng, Q.Y., L.P. Yuan, F.S. Liang, J.M. Li, X.Q. Li, L.G. Wang and
B. Wang (2004) Studies on yield-enhancing genes from wild rice
and their marker-assisted selection in hybrid rice. Hybrid Rice 19:
6-10.

Devlin, B., S.A. Bacanu and K. Roeder (2004) Genomic control to the 
extreme. Nat. Genet. 36: 1129-1130. 

Devlin, B. and K. Roeder (1999) Genomic control for association stud- 
ies. Biometrics 55: 997-1004. 

Doebley, J.F., B.S. Gaut and B.D. Smith (2006) The molecular genetics 
of crop domestication. Cell 127: 1309-1321. 

Doyle, J.J. and J.L.Doyle (1990) Isolation of plant DNA from fresh 
tissue. Focus 12: 13-15. 

Evanno, G., S. Regnaut and J. Goudet (2005) Detecting the number of
clusters of individuals using the software STRUCTURE: a
simulation study. Mol. Ecol. 14: 2611-2620.

Frey, K.J., T.S. Cox, D.M. Rodgers and P. Bramel-Cox (1984)
Increasing cereal yields with genes from wild and weedy species. In
Genetics, new frontiers. In: Chopra, V.L. et al. (eds.) Proceedings
of the XV International Congress of Genetics, vol. IV. pp. 51-68.



Fu, Q., P.J. Zhang, L.B. Tan, Z.F. Zhu, D. Ma, Y.C. Fu, X.C. Zhan,
H.W. Cai and C.Q. Sun (2010) Analysis of QTLs for yield-related
traits in Yuanjiang common wild rice (Oryza rufipogon Griff.). J.
Genet. Genomics 37: 147-157.

Gur, A. and D. Zamir (2004) Unused natural variation can lift yield
barriers in plant breeding. PLoS Biol. 2: e245.

Hajjar, R. and T. Hodgkin (2007) The use of wild relatives in crop
improvement: A survey of developments over the last 20 years.
Euphytica 156: 1-13.

Hardy, O. and X. Vekemans (2002) SPAGeDi: a versatile computer
program to analyse spatial genetic structure at the individual or
population levels. Mol. Ecol. Notes 2: 618-620.

Hwang, T.Y., T. Sayama, M. Takahashi, Y. Takada, Y. Nakamoto,
H. Funatsuki, H. Hisano, S. Sasamoto, S. Sato, S. Tabata et al.
(2009) High-density integrated linkage map based on SSR markers
in soybean. DNA Res. 16: 213-225.

Iyer-Pascuzzi, A.S., M.T. Sweeney, N. Sarla and S.R. McCouch (2007)
Use of naturally occurring alleles for crop improvement. In:
Upadhyaya, N.M. (ed.) Rice Functional Genomics-challenges,
Progress and Prospects, Springer Life Sciences, New York,
pp. 113-143.

Kan, G.Z., Z.F. Tong, Z.B. Hu, D. Zhang, G.Z. Zhang and D.Y. Yu
(2012) Mapping QTLs for yield related traits in wild soybean
(Glycine soja Sieb. and Zucc.). Soybean Sci. 31: 333-340.

Li, D.D., T.W. Pfeiffer and P.L. Cornelius (2008) Soybean QTL for
yield and yield components associated with Glycine soja alleles.
Crop Sci. 48: 571-581.

Liu, K. and S. Muse (2005) PowerMarker: an integrated analysis
environment for genetic marker analysis. Bioinformatics 21:
2128-2129.

Nei, M. (1973) The theory and estimation of genetic distance. In:
Morton, N.E. (ed.) Genetic Structure of Populations, University 
Press of Hawaii, Honolulu, pp. 45-54. 

Price, A.L., N.J.Patterson, R.M.Plenge, M.E. Weinblatt, N.A.Shadick 
and D.Reich (2006) Principal components analysis corrects for 
stratification in genome-wide association studies. Nat. Genet. 38: 
904-909. 

Pritchard, J., M. Stephens and P.Donnelly (2000) Inference of popula- 
tion structure using multilocus genotype data. Genetics 155: 945- 
959. 

R Development Core Team (2010) R: A Language and Environment
for Statistical Computing. R Foundation for Statistical Computing,
Vienna, Austria.

Reeves, R.G. and A.J. Bockholt (1964) Modification and improvement 
of maize inbred by crossing it with Tripsacum. Crop Sci. 4: 7-10. 

Reinprecht, Y., V.W. Poysa, K. Yu, I. Rajcan, G.R. Ablett and
K.P. Pauls (2006) Seed and agronomic QTL in low linolenic acid,
lipoxygenase-free soybean (Glycine max (L.) Merrill) germplasm.
Genome 49: 1510-1527.

Rick, C.M. (1974) High soluble-solids content in large fruited tomato
lines derived from a wild green-fruited species. Hilgardia 42:
493-510.

Salas, P., J.C. Oyarzo-Llaipen, D. Wang, K. Chase and L. Mansur
(2006) Genetic mapping of seed shape in three population of
recombinant inbred lines of soybean (Glycine max (L.) Merrill).
Theor. Appl. Genet. 113: 1459-1466.

Santos, F.R., S.D. Pena and J.T. Epplen (1993) Genetic and population
study of a Y-linked tetranucleotide repeat DNA polymorphism
with a simple non-isotopic technique. Hum. Genet. 90: 655-656.

Song, Q.J., L.F. Marek, R.C. Shoemaker, K.G. Lark, V.C. Concibido,
X. Delannay, J.E. Specht and P.B. Cregan (2004) A new integrated
genetic linkage map of the soybean. Theor. Appl. Genet. 109:
122-128.

Stich, B. and A.E. Melchinger (2009) Comparison of mixed-model
approaches for association mapping in rapeseed, potato, sugar beet,
maize, and Arabidopsis. BMC Genomics 10: 1-14.

Swamy, B.P. and N. Sarla (2008) Yield-enhancing quantitative trait loci
(QTLs) from wild species. Biotechnol. Adv. 26: 106-120. 

Tamura, K., J. Dudley, M. Nei and S. Kumar (2007) MEGA4:
Molecular evolutionary genetics analysis (MEGA) software version 4.0.
Mol. Biol. Evol. 24: 1596-1599.

Tanksley, S.D. and S.R. McCouch (1997) Seed banks and molecular 
maps: unlocking genetic potential from the wild. Science 277: 
1063-1066. 

Xiao, J.H., S. Grandillo, S.N. Ahn, S.R. McCouch, S.D. Tanksley,
J.M. Li and L.P. Yuan (1996) Genes from wild rice improve yield.
Nature 384: 223-224.

Wang, D., G.L. Graef, A.M. Procopiuk and B.W. Diers (2004)
Identification of putative QTL that underlie yield in interspecific soybean
backcross populations. Theor. Appl. Genet. 108: 458-467.

Wang, M., R.Z. Li, W.M. Yang and W.J. Du (2010) Assessing the
genetic diversity of cultivars and wild soybeans using SSR markers. Afr.
J. Biotechnol. 9: 4857-4866.

Wen, Z.X., T.J. Zhao, Y.Z. Zheng, S.H. Liu, C.E. Wang, F. Wang and
J.Y. Gai (2008) Association analysis of agronomic and quality traits
with SSR markers in Glycine max and Glycine soja in China: I.
Population structure and associated markers. Acta Agronomica
Sinica 34: 1169-1178.

Wen, Z.X., Y.L. Ding, T.J. Zhao and J.Y. Gai (2009) Genetic diversity
and peculiarity of annual wild soybean (G. soja Sieb. et Zucc.) from
various eco-regions in China. Theor. Appl. Genet. 119: 371-381.

Yu, J. and E.S. Buckler (2006) Genetic association mapping and
genome organization of maize. Curr. Opin. Biotech. 17: 155-160.

Yu, J., G. Pressoir, W. Briggs, I. Bi, M. Yamasaki, J. Doebley,
M. McMullen, B. Gaut, D. Nielsen, J. Holland et al. (2006) A
unified mixed-model method for association mapping that accounts
for multiple levels of relatedness. Nat. Genet. 38: 203-208.

Zhao, K., M.J. Aranzana, S. Kim, C. Lister, C. Shindo, C. Tang,
C. Toomajian, H. Zheng, C. Dean, P. Marjoram et al. (2007) An
Arabidopsis example of association mapping in structured samples.
PLoS Genet. 3: 71-82.



